knitr::opts_chunk$set(echo = TRUE)

Get the necessary packages

First, load the relevant packages.
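The loading calls themselves do not appear in the rendered output. As a rough sketch only, and assuming a package list inferred from the startup messages and from the functions used later (read_xls from readxl, geom_ma from tidyquant, ggarrange from ggpubr), the packages could be attached like this:

```r
# Assumed package list, inferred from the startup messages and the functions used later
pkgs <- c("tidyverse", "tidyquant", "readxl", "ggpubr")

# Check which of these packages are available before attaching them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  # Attach every package by name
  for (p in pkgs) library(p, character.only = TRUE)
} else {
  # Report what is missing instead of failing halfway through
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```

Checking availability first means the script reports all missing packages at once rather than stopping at the first failed library call.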

## Loading required package: ggplot2
## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
## Loading required package: quantmod
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
## 
##     first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.4     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x dplyr::first()           masks xts::first()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x dplyr::last()            masks xts::last()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()

Here we read in the two data sets and store each of them in a data frame.

df <- read_csv("Africa_19_20.csv")
## Rows: 90707 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): event_id_cnty, event_date, event_type, sub_event_type, actor1, ass...
## dbl (13): data_id, iso, event_id_no_cnty, year, time_precision, inter1, inte...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df2 <- read_xls("Scrape-Africa.xls")

There are some irrelevant columns that we want to remove. We save the cleaned data in a new data frame, so that the original data stays untouched and every cleaning step remains transparent.

clean <- c("iso", "iso3", "event_id_no_cnty", "event_id_cnty", "time_precision", "inter1", "inter2", "interaction", "geo_precision")

cleandata <- df[,!(names(df) %in% clean)]
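As a small illustration of the column removal, here is a sketch on a made-up data frame (the values and the name not_a_column are hypothetical). It shows that names which do not exist in the data are simply ignored by %in%:

```r
# Toy data frame standing in for the event data (values are made up)
toy <- data.frame(
  data_id = 1:3,
  iso = c(12, 24, 72),
  country = c("Algeria", "Angola", "Botswana"),
  geo_precision = c(1, 2, 1)
)

# Columns to drop; "not_a_column" does not exist and has no effect
drop_cols <- c("iso", "geo_precision", "not_a_column")

# TRUE for every column we keep, FALSE for every column we drop
keep <- !(names(toy) %in% drop_cols)

# drop = FALSE keeps the result a data frame even if one column remains
toy_clean <- toy[, keep, drop = FALSE]
names(toy_clean)
```

The original toy data frame is left unchanged, which mirrors the idea of keeping df intact.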

We want to have the date in our data frame listed as a date:

cleandata <- cleandata %>% 
  mutate(event_date = dmy(event_date))

We don’t want to include the year 2018:

cleandata <- cleandata %>%
  filter(year > 2018)

Now that the data is clean, we can start using it. First, to get an overview of the events in Africa, we want to make an interactive map showing the events geographically.

Here we load the necessary package “leaflet”; the installation line is commented out and only needs to be run once.

#install.packages("leaflet")
library(leaflet)
## 
## Attaching package: 'leaflet'
## The following object is masked from 'package:xts':
## 
##     addLegend

Here we specify which fields to show in the popup when clicking on a specific event.

cleandata %>% 
mutate(content = paste0('<b>Event type:</b> ', event_type, '<br>', '<b>Fatalities: </b>', fatalities, '<br>', '<b>Event date:</b> ', event_date, '<br>', '<b>Country:</b> ', country, '<br>')) -> cleandata
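To show what such a popup string looks like, here is a minimal base R sketch on two made-up events (the data frame events and its values are hypothetical). paste0 works element-wise, so each row gets its own HTML snippet:

```r
# Two made-up events for illustration only
events <- data.frame(
  event_type = c("Protests", "Battles"),
  fatalities = c(0, 3),
  event_date = as.Date(c("2021-11-26", "2021-11-25")),
  country = c("Ghana", "Mali")
)

# Build one HTML string per row; format() makes the date conversion explicit
events$content <- paste0(
  "<b>Event type:</b> ", events$event_type, "<br>",
  "<b>Fatalities: </b>", events$fatalities, "<br>",
  "<b>Event date:</b> ", format(events$event_date, "%Y-%m-%d"), "<br>",
  "<b>Country:</b> ", events$country, "<br>"
)

events$content[1]
```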

Here we place the events on the map using the latitude and longitude from the data set. Events that lie close to each other are grouped into clusters, which split up or merge depending on how far you zoom in or out.

cleandata %>% 
  leaflet() %>%
  addTiles() %>%
  addMarkers(~longitude, ~latitude, label = ~event_type, popup = ~content, clusterOptions = markerClusterOptions()) 
cleandata
## # A tibble: 88,306 × 23
##    data_id event_date  year event_type  sub_event_type  actor1    assoc_actor_1 
##      <dbl> <date>     <dbl> <chr>       <chr>           <chr>     <chr>         
##  1 8662954 2021-11-26  2021 Violence a… Attack          JNIM: Gr… <NA>          
##  2 8662955 2021-11-26  2021 Strategic … Looting/proper… Katiba M… JNIM: Group f…
##  3 8662956 2021-11-26  2021 Battles     Armed clash     JNIM: Gr… <NA>          
##  4 8662996 2021-11-26  2021 Protests    Protest with i… Proteste… #FixTheCountr…
##  5 8663006 2021-11-26  2021 Riots       Violent demons… Rioters … Former UTM: U…
##  6 8663017 2021-11-26  2021 Battles     Armed clash     Katiba G… JNIM: Group f…
##  7 8663030 2021-11-26  2021 Protests    Peaceful prote… Proteste… Labour Group …
##  8 8663080 2021-11-26  2021 Violence a… Attack          Unidenti… <NA>          
##  9 8663081 2021-11-26  2021 Strategic … Agreement       Zamfara … <NA>          
## 10 8663087 2021-11-26  2021 Explosions… Remote explosi… JNIM: Gr… <NA>          
## # … with 88,296 more rows, and 16 more variables: actor2 <chr>,
## #   assoc_actor_2 <chr>, region <chr>, country <chr>, admin1 <chr>,
## #   admin2 <chr>, admin3 <chr>, location <chr>, latitude <dbl>,
## #   longitude <dbl>, source <chr>, source_scale <chr>, notes <chr>,
## #   fatalities <dbl>, timestamp <dbl>, content <chr>

Next we want to know how the number of events develops over the period. We therefore create a new data frame with the number of events per day. As before, the date column must be converted to a date, and the count column is converted to numeric.

countingafrica <- table(cleandata$event_date)
datecount <- setNames(data.frame(table(cleandata$event_date)),c("date", "count"))

datecount <- datecount %>% 
  mutate(date = ymd(date))

# Assign the result; calling as.numeric() alone only prints the values
datecount$count <- as.numeric(datecount$count)
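The counting step can be sketched in base R on a few made-up dates. One thing this shows is that table() only lists days that actually occur in the data; whether days without events should be added back as zeros is an assumption about the analysis, not something the original code does:

```r
# Three made-up event dates; 2021-01-02 has no events
dates <- as.Date(c("2021-01-01", "2021-01-01", "2021-01-03"))

# Count events per day and convert the table to a data frame
counts <- as.data.frame(table(dates), stringsAsFactors = FALSE)
names(counts) <- c("date", "count")
counts$date <- as.Date(counts$date)
counts$count <- as.numeric(counts$count)

# Optional: add the missing days back with a count of zero
all_days <- data.frame(date = seq(min(counts$date), max(counts$date), by = "day"))
full <- merge(all_days, counts, by = "date", all.x = TRUE)
full$count[is.na(full$count)] <- 0

full
```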

Now that we have made this data frame, we can make a plot using ggplot2. The blue line is a simple moving average (SMA) of the daily counts, calculated over a window of 60 observations.

# Plot
datecount %>%
  ggplot(aes(x=date, y=count)) +
    geom_line() +
    ggtitle("Conflicts in Africa 2019-2021") +
    ylab("Events") +
    scale_x_date(date_breaks = "6 months",
                 date_labels = ("%b %y")) +
  geom_ma(ma_fun = SMA, n = 60, color = "blue", size = 1)
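To make the smoothing step concrete, here is a minimal base R sketch of a trailing simple moving average. It is not the code behind geom_ma; the helper name sma and the toy vector are made up for illustration:

```r
# Trailing simple moving average: mean of the current and the n - 1 previous values
sma <- function(x, n) {
  # sides = 1 uses past values only, so the first n - 1 results are NA
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

x <- c(2, 4, 6, 8, 10)
sma(x, 3)
```

With a window of 3, the first two values are NA and the third is the mean of 2, 4 and 6. With a window of 60, the blue line therefore only starts at the 60th observation.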

Here we make a data frame listing the total number of events per country.

countingcountry <- table(cleandata$country)
countofcountry <- setNames(data.frame(table(cleandata$country)),c("Country","Count"))

Here we make a data frame listing the total number of fatalities per country and renaming the column.

fata <- aggregate(fatalities~country,cleandata,sum)
names(fata)[2] <- "Fatalities"
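As a small sketch of what aggregate does here, using made-up rows (the data frame toy is hypothetical): the formula fatalities ~ country sums the fatalities within each country and returns one row per country, sorted by country name.

```r
# Made-up events for illustration only
toy <- data.frame(
  country = c("Mali", "Ghana", "Mali"),
  fatalities = c(3, 0, 2)
)

# Sum fatalities within each country
toy_fata <- aggregate(fatalities ~ country, toy, sum)
names(toy_fata)[2] <- "Fatalities"

toy_fata
```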

Here we combine the two data frames above into “countrydata” and add three further columns from df2 (GDP per capita, political stability index and HDI value).

# Combine the above two data frames
countrydata <- cbind(countofcountry, fata$Fatalities, df2$`gdp-per-capita`, df2$`political-stability-index`, df2$`hdi-value`)

# Renaming the columns
names(countrydata)[names(countrydata) == 'fata$Fatalities'] <- 'fatalities'
names(countrydata)[names(countrydata) == 'df2$`gdp-per-capita`'] <- 'GDP-per-capita'
names(countrydata)[names(countrydata) == 'df2$`political-stability-index`'] <- 'political_stability_index'
names(countrydata)[names(countrydata) == 'df2$`hdi-value`'] <- 'HDI-value'
names(countrydata)[names(countrydata) == 'Count'] <- 'number_of_events'
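Note that cbind matches rows purely by position, so it relies on every source listing the same countries in the same order. As a hedged alternative sketch, and assuming the country names are spelled identically in both tables, merge matches on the name instead (the toy data frames below are made up):

```r
# Made-up tables; the second one is in a different order and lacks Angola
events <- data.frame(
  Country = c("Algeria", "Angola", "Benin"),
  number_of_events = c(10, 5, 2)
)
fatal <- data.frame(
  country = c("Benin", "Algeria"),
  Fatalities = c(1, 7)
)

# Match rows by country name; all.x = TRUE keeps countries missing from fatal
combined <- merge(events, fatal, by.x = "Country", by.y = "country", all.x = TRUE)

combined
```

Countries without a match get NA rather than a value from a neighbouring row, which makes mismatches visible.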

countrydata[is.na(countrydata)] <- 0

population <- c(43851044, 32866272, 12123200, 2351627, 20903273, 11890784, 26545863, 555987, 4829767, 16425864, 869601, 89561403, 988000, 102334404, 1402985, 3546421, 1160164, 114963588, 2225734, 2416668, 31072940, 13132795, 1968001, 26378274, 53771296, 2142249, 5057681, 6871292, 27691018, 19129952, 20250833, 4649658, 1271768, 272815, 36910560, 31255435, 2540905, 24206644, 206139589, 5518087, 895312, 12952218, 6077, 219159, 16743927, 98347, 7976983, 15893222, 59308690, 11193725, 43849260, 59734218, 8278724, 11818619, 45741007, 18383955, 14862924)

countrydata <- cbind(countrydata, population)

countrydata$event_index <- with(countrydata, number_of_events / population)

countrydata$fatalities_index <- with(countrydata, fatalities / number_of_events)
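The two indices can be checked by hand on a made-up example (the data frame cd and the extra column events_per_100k are hypothetical and only added here for readability): the event index is events per inhabitant, and the fatalities index is fatalities per event.

```r
# Two made-up countries for illustration only
cd <- data.frame(
  Country = c("A", "B"),
  number_of_events = c(200, 50),
  fatalities = c(100, 0),
  population = c(1e6, 5e4)
)

# Events per inhabitant
cd$event_index <- cd$number_of_events / cd$population

# Hypothetical rescaling: events per 100,000 inhabitants is easier to read
cd$events_per_100k <- cd$event_index * 1e5

# Fatalities per event
cd$fatalities_index <- cd$fatalities / cd$number_of_events

cd
```

Country B has far fewer events than A in absolute terms but five times as many per inhabitant, which is the kind of difference the event index is meant to reveal.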

countrydata$political_stability_index <- as.numeric(countrydata$political_stability_index)

countrydata$`HDI-value` <- as.numeric(countrydata$`HDI-value`)
# Plot for events per country
plotevents <- countrydata %>%
  ggplot(aes(x=number_of_events, y=reorder(Country, number_of_events))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Number of events per country in 2019-2021 - Africa") +
  xlab("Number of events")
# Plot for fatalities per country
plotfata <- countrydata %>%
  ggplot(aes(x=fatalities, y=reorder(Country, fatalities))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Number of fatalities per country in 2019-2021 - Africa") +
  xlab("Number of fatalities")
eventsfata <- ggarrange(plotevents, plotfata + rremove("x.text"),
          labels = c("A", "B"),
          ncol = 2, nrow = 1)
# Plot for fatalities_index per country
countrydata %>%
  ggplot(aes(x=fatalities_index, y=reorder(Country, fatalities_index))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Fatalities index per country in 2019-2021 - Africa") +
  xlab("Fatalities index")

# Plot for events per country
plotevents <- countrydata %>%
  ggplot(aes(x=number_of_events, y= Country)) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Events per country in 2019-2021 - Africa") +
  xlab("Number of events")
# Plot for population per country
plotpop <- countrydata %>%
  ggplot(aes(x=population, y= Country)) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Population per country in 2019-2021 - Africa") +
  xlab("Population")
ggarrange(plotevents, plotpop + rremove("x.text"), 
          labels = c("A", "B"),
          ncol = 2, nrow = 1)

# Plot for GDP per country
plot1 <- countrydata %>%
  ggplot(aes(x=`GDP-per-capita`, y=reorder(Country, -`GDP-per-capita`))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("GDP per capita per country in 2019-2021 - Africa") +
  xlab("GDP per capita")
# Plot for political stability per country
plot2 <- countrydata %>%
  ggplot(aes(x=political_stability_index, y=reorder(Country, -political_stability_index))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Political stability per country in 2019-2021 - Africa") +
  xlab("Political stability index")
# Plot for HDI value per country
plot3 <- countrydata %>%
  ggplot(aes(x=`HDI-value`, y=reorder(Country, -`HDI-value`))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("HDI value per country in 2019-2021 - Africa") +
  xlab("HDI value")
# Plot for events per country
plot4 <- countrydata %>%
  ggplot(aes(x=number_of_events, y=reorder(Country, number_of_events))) +
    geom_bar(stat = "identity") +
  ylab("Country") +
    ggtitle("Events per country in 2019-2021 - Africa") +
  xlab("Events")

Here we combine the 4 plots above for better comparison.

fourgraphs <- ggarrange(plot1, plot2, plot3, plot4 + rremove("x.text"),
          labels = c("A", "B", "C", "D"),
          ncol = 2, nrow = 2)
annotate_figure(fourgraphs, top = text_grob("Figure 1", face = "bold", size = 25))

From the plots above we can see that Tunisia's event index is high despite the country having a high HDI value and a high GDP per capita. This deviates from the general pattern, which is why we want to take a closer look at Tunisia. We therefore create a data frame for Tunisia showing the number of events per day.

tunisia <- filter(cleandata, country == "Tunisia")

countingtunisia <- table(tunisia$event_date)
datecounttunisia <- setNames(data.frame(table(tunisia$event_date)),c("date", "count"))

datecounttunisia <- datecounttunisia %>% 
  mutate(date = ymd(date))

# Assign the result; calling as.numeric() alone only prints the values
datecounttunisia$count <- as.numeric(datecounttunisia$count)

Now that we have made this data frame, we can plot it with ggplot2 in the same way as for the whole continent. The blue line is again a simple moving average (SMA) over a window of 60 observations.

# Plot
datecounttunisia %>%
  ggplot(aes(x=date, y=count)) +
    geom_line() +
    ggtitle("Conflicts in Tunisia 2019-2021") +
    ylab("Events") +
    scale_x_date(date_breaks = "6 months",
                 date_labels = ("%b %y")) +
  geom_ma(ma_fun = SMA, n = 60, color = "blue", size = 1)